Abstract

Most metrics between finite point measures currently used in the literature have the flaw that they do not treat differing total masses in an adequate manner for applications. This paper introduces a new metric $\bar d_1$ that combines positional differences of points under a closest match with the relative difference in total mass in a way that fixes this flaw. A comprehensive collection of theoretical results about $\bar d_1$ and its induced Wasserstein metric $\bar d_2$ for point process distributions is given, including examples of useful $\bar d_1$-Lipschitz continuous functions, $\bar d_2$ upper bounds for Poisson process approximation, and $\bar d_2$ upper and lower bounds between distributions of point processes of i.i.d. points. Furthermore, we present a statistical test for multiple point pattern data that demonstrates the potential of $\bar d_1$ in applications.
1 Introduction

The two metrics most widely used on the space $\mathfrak N$ of finite point measures on a compact metric space $(\mathcal X, d_0)$ are the Prohorov metric $\varrho$ and the metric $d_1$ that was introduced in Barbour and Brown (1992a). We use $\delta_x$ to stand for the Dirac measure at $x$. For $\xi = \sum_{i=1}^m \delta_{x_i}$, $\eta = \sum_{i=1}^n \delta_{y_i} \in \mathfrak N$ and $d_0 \le 1$ the metric $d_1$ is given by

$$d_1(\xi,\eta) := \min_{\pi \in \Pi_n} \frac{1}{n} \sum_{i=1}^n d_0(x_i, y_{\pi(i)}) \tag{1.1}$$

if $m = n \ge 1$ and $d_1(\xi,\eta) := 1$ if $m \ne n$, where $\Pi_n$ denotes the set of permutations of $\{1, 2, \dots, n\}$. The gap between $d_1 =: d_1^{(1)}$ and $\varrho \wedge 1 =: d_1^{(\infty)}$ can be bridged by metrics $d_1^{(p)}$ where the average in (1.1) is replaced by a general $p$-th order average (see Schuhmacher, 2007b).
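The definition (1.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the points are taken to be coordinate tuples, `d0` is assumed to be the Euclidean distance truncated at 1, and the empty pair is assigned distance 0; the names `d0` and `d1` are ours. The minimum over $\Pi_n$ is found by brute force, which is feasible for very small patterns only.

```python
from itertools import permutations
from math import dist


def d0(x, y):
    """Assumed ground metric: Euclidean distance truncated at 1, so d0 <= 1."""
    return min(dist(x, y), 1.0)


def d1(xi, eta):
    """Brute-force version of (1.1) for two lists of points."""
    m, n = len(xi), len(eta)
    if m != n:
        return 1.0  # differing cardinalities always give the maximal distance
    if n == 0:
        return 0.0  # assumption: two empty patterns are at distance 0
    # minimize the total matching cost over all permutations of eta
    best = min(
        sum(d0(x, eta[j]) for x, j in zip(xi, perm))
        for perm in permutations(range(n))
    )
    return best / n
```

For patterns of realistic size the inner minimization would normally be handed to a linear assignment solver instead of enumerating permutations.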
All of these metrics are good choices from a theoretical point of view, because they metrize the natural vague topology on $\mathfrak N$. Furthermore, especially $d_1$ has been highly successful as an underlying metric for defining a Wasserstein metric $d_2$ between point process distributions: letting $\mathcal F_2 := \{f : \mathfrak N \to [0,1];\ |f(\xi) - f(\eta)| \le d_1(\xi,\eta) \text{ for all } \xi,\eta \in \mathfrak N\}$, we set

$$d_2(P,Q) := \sup_{f \in \mathcal F_2} \Big| \int f \, dP - \int f \, dQ \Big| \tag{1.2}$$

for probability measures $P$ and $Q$ on $\mathfrak N$. Many estimates in this metric have been obtained; included amongst them are the results of Barbour and Brown (1992a), Brown and Xia (1995a), Brown and Xia (2001), Barbour and Månsson (2002), Chen and Xia (2004), Schuhmacher (2007a), and Schuhmacher (2007b), which for the most part assume that one of the probability measures involved is a Poisson (or compound Poisson) process distribution. Such estimates can be used to compare the distributions of point pattern statistics $S(\Xi)$, where $S \in \mathcal F_2$, for different underlying point process models, since the Wasserstein distance $d_W(\mathcal L(S(\Xi)), \mathcal L(S(\Xi')))$ (see pp. 254-255 of Barbour et al. (1992)) is easily seen to be bounded by $d_2(\mathcal L(\Xi), \mathcal L(\Xi'))$. For a concrete example where this was exploited, see Schuhmacher (2005b), Section 3.2.

However, there are certain limitations with respect to the practical applications of the metric $d_1$ (as well as of the other metrics between point measures that were mentioned), which are mainly due to the fact that $d_1(\xi,\eta)$ is always set to the maximal distance 1 if the total numbers of points of the point patterns $\xi$ and $\eta$ disagree. Such crude treatment results in a metric that usually does not reflect very well our intuitive idea of two point patterns being "far apart" from one another if the cardinalities of the point patterns are different, as can be seen from the extreme case illustrated in Figure 1.1. This flaw is, in our opinion, the main reason why such metrics have not been taken up in more application-oriented fields, such as spatial statistics.

In the present article we introduce a new metric $\bar d_1$, which refines the metric $d_1$ in the sense that $\bar d_1(\xi,\eta) = d_1(\xi,\eta)$ if the cardinalities of the two point patterns $\xi$ and $\eta$ agree, but $\bar d_1(\xi,\eta)$ can take general values in $(0, 1]$ if the cardinalities disagree. In particular, $\bar d_1$ assigns a large distance if the difference of the numbers of points is large compared to the total number of points in the point pattern with more points and it takes the quality of point matchings into account even if the total numbers are not the same.

While $\bar d_1$ is a slightly weaker metric than $d_1$, it still metrizes the same topology as $d_1$, and its induced Wasserstein metric $\bar d_2$ still metrizes convergence in distribution of point processes and provides an upper bound for the Wasserstein distance $d_W(\mathcal L(S(\Xi)), \mathcal L(S(\Xi')))$ for many of the useful point pattern statistics $S$ that $d_2$ does. As far as Poisson process approximation is concerned, we are able to obtain better bounds in the $\bar d_2$-metric than in the stronger $d_2$-metric for a wide range of situations. We furthermore present a simulation study that assesses the powers of certain tests based on $\bar d_1$ and demonstrates its usefulness in spatial statistics.



2 Definition and elementary properties 

Let $(\mathcal X, d_0)$ be a compact metric space with $d_0 \le 1$, on which we always consider the Borel $\sigma$-algebra $\mathcal B$. Denote the space of all finite point measures on $\mathcal X$ by $\mathfrak N$ and equip it as usual with the vague topology and the $\sigma$-algebra $\mathcal N$ generated by this topology, which is the smallest $\sigma$-algebra that renders the point counts on measurable sets measurable (see Kallenberg, 1986, Section 1.1, Lemma 4.1, and Section 15.7). Recall that a point process is just a random element of $\mathfrak N$.

Figure 1.1: The left is a realization of 99 independent and uniformly distributed points and the right is the same as the left except an additional point is added. Intuitively, we would say both point patterns are very similar. However, the $d_1$-distance between the two is maximal, whereas the $\bar d_1$-distance is only 0.01 (out of a possible range of [0,1]).

Definition. Let $\bar d_1$ be the symmetric map $\mathfrak N^2 \to \mathbb R_+$ that is given by

$$\bar d_1(\xi,\eta) := \frac{1}{n} \Big( \min_{\pi \in \Pi_n} \sum_{i=1}^m d_0(x_i, y_{\pi(i)}) + (n - m) \Big)$$

for $\xi = \sum_{i=1}^m \delta_{x_i}$, $\eta = \sum_{j=1}^n \delta_{y_j} \in \mathfrak N$ with $n \ge \max(m, 1)$, and $\bar d_1(0, 0) := 0$.

In essence, we arrange for $\xi$ and $\eta$ to have the same number of points by introducing extra points located at distance 1 from $\mathcal X$, and then take the average distance between the points under a closest match (which is the $d_1$-distance).

Proposition 2.A. The map $\bar d_1$ is a metric that is bounded by 1.

The proof of this proposition, as well as further proofs that are of a more technical nature and would otherwise disrupt the flow of the main text, can be found in the appendix. It is convenient to introduce the "relative difference metric" $d_R$ on $\mathbb Z_+$, which is given by $d_R(m, n) := |m - n| / \max(m, n)$ for $\max(m, n) > 0$. The triangle inequality for $d_R$ follows immediately from the triangle inequality for $\bar d_1$, because we have $d_R(m,n) = \bar d_1(m\delta_x, n\delta_x)$.

Proposition 2.B. The following statements about $\bar d_1$ hold.

(i) $d_R(|\xi|, |\eta|) \le \bar d_1(\xi,\eta) \le d_1(\xi,\eta)$ for all $\xi, \eta \in \mathfrak N$;

(ii) $\bar d_1$ metrizes the vague (=weak) topology on $\mathfrak N$;

(iii) The metric space $(\mathfrak N, \bar d_1)$ is locally compact, complete, and separable.

We next define the metric $\bar d_2$ on the space $\mathcal P(\mathfrak N)$ of probability distributions on $(\mathfrak N, \mathcal N)$ just as the Wasserstein metric with respect to $\bar d_1$.

Definition. Let $\bar{\mathcal F}_2 := \{f : \mathfrak N \to [0, 1];\ |f(\xi) - f(\eta)| \le \bar d_1(\xi,\eta) \text{ for all } \xi,\eta \in \mathfrak N\}$. Set then

$$\bar d_2(P,Q) := \sup_{f \in \bar{\mathcal F}_2} \Big| \int f \, dP - \int f \, dQ \Big|$$

for $P, Q \in \mathcal P(\mathfrak N)$.
Since this is exactly the Wasserstein construction (the fact that we restrict the functions in $\bar{\mathcal F}_2$ to be $[0,1]$-valued has no influence on the supremum, because the underlying $\bar d_1$-metric is bounded by 1), it is clear that $\bar d_2$ is a metric that is bounded by 1, and we can easily derive basic properties. For two probability distributions $\mu$ and $\nu$ on $\mathbb Z_+$, write $d_{RW}(\mu,\nu) := \min_{M \sim \mu, N \sim \nu} \mathbb E\, d_R(M, N)$, which is the Wasserstein distance with respect to $d_R$ (compare property (i) below).

Proposition 2.C. The metric $\bar d_2$ satisfies

(i) $\bar d_2(P,Q) = \min_{\Xi \sim P, \mathrm H \sim Q} \mathbb E\, \bar d_1(\Xi, \mathrm H)$ for all $P, Q \in \mathcal P(\mathfrak N)$;

(ii) $d_{RW}(\mathcal L(|\Xi|), \mathcal L(|\mathrm H|)) \le \bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H)) \le d_2(\mathcal L(\Xi), \mathcal L(\mathrm H))$ for any point processes $\Xi$ and $\mathrm H$;

(iii) $\bar d_2$ metrizes the weak topology on $\mathcal P(\mathfrak N)$, so that $\Xi_n \to \Xi$ in distribution iff $\bar d_2(\mathcal L(\Xi_n), \mathcal L(\Xi)) \to 0$.
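As a rough illustration of the definition of $\bar d_1$ in this section, the following Python sketch pads the smaller pattern implicitly with $n - m$ points at distance 1 and averages over the best partial match. The names `d0`, `d1_bar` and `d_R` are ours, the ground metric is assumed to be the truncated Euclidean distance, and setting `d_R(0, 0)` to 0 is a convention of this sketch. The search is by brute force, so it is suitable for tiny patterns only.

```python
from itertools import permutations
from math import dist


def d0(x, y):
    """Assumed ground metric: Euclidean distance truncated at 1."""
    return min(dist(x, y), 1.0)


def d1_bar(xi, eta):
    """Brute-force sketch of the metric written as d_1-bar in the text."""
    if len(xi) > len(eta):
        xi, eta = eta, xi  # the map is symmetric; let xi be the smaller pattern
    m, n = len(xi), len(eta)
    if n == 0:
        return 0.0
    # choose which m points of eta are matched to the m points of xi;
    # each of the remaining n - m points of eta contributes distance 1
    best = min(
        sum(d0(x, eta[j]) for x, j in zip(xi, sel))
        for sel in permutations(range(n), m)
    )
    return (best + (n - m)) / n


def d_R(m, n):
    """Relative difference metric on the nonnegative integers."""
    return abs(m - n) / max(m, n) if max(m, n) > 0 else 0.0
```

With identical patterns except for one extra point among $n$ points in total, the matching cost is 0 and the distance is $1/n$, which for $n = 100$ is the value 0.01 quoted in the caption of Figure 1.1.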



3 Lipschitz continuous functions 

By the definition of $\bar d_2$, upper bounds for a distance $\bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H))$ also bound the difference $|\mathbb E f(\Xi) - \mathbb E f(\mathrm H)|$ for any $f \in \bar{\mathcal F}_2$. It is thus of considerable interest for the application of estimates such as those obtained in Section 4 to have a certain supply of "meaningful" $\bar d_1$-Lipschitz continuous statistics of point patterns (where we do not worry too much about the Lipschitz constant as it will only appear as an additional factor in the upper bound).

For the $d_1$-metric, a selection of such statistics was given in Section 10.2 of Barbour et al. (1992) and in Subsection 3.3.1 of Schuhmacher (2005a). Since $\bar d_1$ is in general strictly smaller than $d_1$, we cannot reasonably expect all of these functions to lie in $\bar{\mathcal F}_2$. However, we are able to recover many of the most important examples, which is illustrated by the two propositions below. This is mainly due to the fact that these functions take all the points in the pattern into account without fundamentally distinguishing how many there are, which is a situation where a $d_1$-Lipschitz condition typically provides too much room in the upper bound.

Our first proposition concerns certain U-statistics with Lipschitz continuous kernels (the former are usually considered for a fixed number of points, but the extension is obvious). See Lee (1990) for detailed results about such statistics.



Proposition 3.A. Suppose that $\mathcal Y \supset \mathcal X$ and extend the metric $d_0$ to $\mathcal Y$ in such a way that it is still bounded by 1. Fix $l \in \mathbb N := \{1, 2, \dots\}$ and write $\mathfrak N_{l+} := \{\xi \in \mathfrak N;\ |\xi| \ge l\}$. Let $K : \mathcal Y^l \to [0, 1]$ be a symmetric function that satisfies

(i) $|K(u_1, \dots, u_l) - K(v_1, \dots, v_l)| \le \frac{1}{l} \sum_{i=1}^l d_0(u_i, v_i)$ for all $u_1, \dots, u_l, v_1, \dots, v_l \in \mathcal Y$;

(ii) for every $N \in \mathbb N$ there are $\tilde u_1, \dots, \tilde u_N \in \mathcal Y$ such that for any $k \in \{1, \dots, l\}$ and any selection $1 \le i_1 < \dots < i_k \le N$ of $k$ indices

$$K(\tilde u_{i_1}, \tilde u_{i_2}, \dots, \tilde u_{i_k}, u_{k+1}, u_{k+2}, \dots, u_l) \ge K(u_1, u_2, \dots, u_k, u_{k+1}, u_{k+2}, \dots, u_l)$$

for all $u_1, u_2, \dots, u_l \in \mathcal X$;

(iii) for every $k \in \{1, \dots, l\}$ we have

$$K(u_1, u_1, \dots, u_1, u_{k+1}, u_{k+2}, \dots, u_l) \le K(u_1, u_2, \dots, u_k, u_{k+1}, u_{k+2}, \dots, u_l)$$

for all $u_1, u_2, \dots, u_l \in \mathcal X$.

Define $f : \mathfrak N_{l+} \to [0, 1]$ by

$$f(\xi) := \frac{1}{\binom{m}{l}} \sum_{1 \le i_1 < \dots < i_l \le m} K(x_{i_1}, x_{i_2}, \dots, x_{i_l}) \tag{3.1}$$

for $\xi = \sum_{i=1}^m \delta_{x_i} \in \mathfrak N$ with $m \ge l$. Then there exists an extension $F$ of $f$ to the whole of $\mathfrak N$ such that $F \in \bar{\mathcal F}_2$.

One possible choice for the function $K$ in the above result is half the interpoint distance, i.e. $K(u_1, u_2) = \frac{1}{2} d_0(u_1, u_2)$ for all $u_1, u_2 \in \mathcal X$. If $\mathcal X \subset \mathbb R^D =: \mathcal Y$ for some $D \in \mathbb N$ and $d_0(x,y) = |x - y| \wedge 1$ for all $x, y \in \mathbb R^D$, we can consider more generally the diameter of the minimal bounding ball, defining

$$K(u_1, \dots, u_l) := \frac{1}{l} \min\{\mathrm{diam}_0(B);\ B \subset \mathbb R^D \text{ closed Euclidean ball with } u_1, \dots, u_l \in B\}$$

for $l \ge 2$ and $u_1, \dots, u_l \in \mathbb R^D$, where $\mathrm{diam}_0(B) := \sup\{d_0(x, y);\ x, y \in B\}$. It can be shown that this yields again a function that satisfies (i)-(iii).
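The statistic (3.1) with the half interpoint distance kernel can be sketched as follows. This is a minimal illustration, assuming points given as coordinate tuples and the truncated Euclidean distance as $d_0$; the function names are ours.

```python
from itertools import combinations
from math import dist


def half_interpoint(u1, u2):
    """Kernel K(u1, u2) = d0(u1, u2) / 2 with d0 the Euclidean distance truncated at 1."""
    return 0.5 * min(dist(u1, u2), 1.0)


def u_statistic(points, kernel=half_interpoint, l=2):
    """Average of a symmetric kernel over all l-subsets of the pattern, as in (3.1)."""
    subsets = list(combinations(points, l))
    if not subsets:
        raise ValueError("the pattern needs at least l points")
    return sum(kernel(*s) for s in subsets) / len(subsets)
```

The sketch covers only the function $f$ on patterns with at least $l$ points; the extension $F$ to smaller patterns is constructed in the proof below.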

The second proposition looks at the average nearest neighbor distance in a finite point pattern on $\mathbb R^D$. This statistic gives important information about the amount of clustering in the pattern.

Proposition 3.B. Let $\mathcal X \subset \mathbb R^D$, and $d_0(x,y) = |x - y| \wedge 1$ for all $x, y \in \mathbb R^D$. Define the function $f : \mathfrak N_{2+} \to [0, 1]$ by

$$f(\xi) := \frac{1}{m} \sum_{i=1}^m \min_{j \in \{1, \dots, m\},\, j \ne i} d_0(x_i, x_j)$$

for $\xi = \sum_{i=1}^m \delta_{x_i} \in \mathfrak N$ with $m \ge 2$. Then there exists an extension $F$ of $f$ to the whole of $\mathfrak N$ that is $\bar d_1$-Lipschitz continuous with constant $\tau_D + 1$, where $\tau_D$ denotes the kissing number in $D$ dimensions (i.e. the maximal number of unit balls that can touch a unit ball in $\mathbb R^D$ without producing any overlaps of the interiors; see Conway and Sloane (1999), Section 1.2, for details).
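The average nearest neighbor distance of Proposition 3.B is straightforward to compute directly. The sketch below assumes points given as coordinate tuples and uses a quadratic-time scan; the function name is ours.

```python
from math import dist


def avg_nn_distance(points):
    """Mean over all points of the truncated distance to the nearest other point."""
    m = len(points)
    if m < 2:
        raise ValueError("the statistic is defined for at least two points")
    total = 0.0
    for i, x in enumerate(points):
        # nearest neighbor of x among the other points, with d0 = min(|x - y|, 1)
        total += min(min(dist(x, y), 1.0) for j, y in enumerate(points) if j != i)
    return total / m
```

Small values indicate that most points have a close neighbor, which is what one expects in a clustered pattern.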

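Proof of Proposition 3.A. Fix a point $x_0 \in \mathcal X$ and define $F(\xi') := f(\xi' + (l - |\xi'|)_+ \delta_{x_0})$ for every $\xi' \in \mathfrak N$. It suffices to show that $|f(\xi) - f(\eta)| \le \bar d_1(\xi,\eta)$ for $\xi, \eta \in \mathfrak N$ with $|\xi|, |\eta| \ge l$, because this implies that

$$|F(\xi') - F(\eta')| = |f(\xi' + (l - |\xi'|)_+ \delta_{x_0}) - f(\eta' + (l - |\eta'|)_+ \delta_{x_0})|$$
$$\le \bar d_1(\xi' + (l - |\xi'|)_+ \delta_{x_0},\ \eta' + (l - |\eta'|)_+ \delta_{x_0})$$
$$\le \bar d_1(\xi', \eta')$$

for every $\xi', \eta' \in \mathfrak N$. Let then $\xi = \sum_{i=1}^m \delta_{x_i}$ and $\eta = \sum_{i=1}^n \delta_{y_i}$, where $m, n \ge l$ and without loss of generality $m \le n$ (because of the symmetry of the inequality that we would like to show). We add $n - m$ points $x_{m+1}, \dots, x_n$ to $\xi$ in one of the following two ways depending on whether $f(\xi) \ge f(\eta)$ or $f(\xi) < f(\eta)$, and call the result $\tilde\xi := \sum_{i=1}^n \delta_{x_i}$.

If $f(\xi) \ge f(\eta)$, let $x_{m+r} := \tilde u_r$, $1 \le r \le n - m$, for points $\tilde u_1, \dots, \tilde u_{n-m}$ chosen as in assumption (ii) with $N = n - m$. It follows that

$$f(\tilde\xi) = \frac{1}{\binom{n}{l}} \sum_{1 \le i_1 < \dots < i_l \le n} K(x_{i_1}, \dots, x_{i_l})$$
$$= \frac{1}{\sum_{j=0}^{l} \binom{m}{j} \binom{n-m}{l-j}} \sum_{j=0}^{l} \sum_{\substack{1 \le i_1 < \dots < i_j \le m \\ m+1 \le i_{j+1} < \dots < i_l \le n}} K(x_{i_1}, \dots, x_{i_l})$$
$$\ge \frac{1}{\binom{m}{l}} \sum_{1 \le i_1 < \dots < i_l \le m} K(x_{i_1}, \dots, x_{i_l}) = f(\xi). \tag{3.2}$$

The inequality is a consequence of the fact that $\big(\sum_{j=0}^{l} a_j\big) / \big(\sum_{j=0}^{l} b_j\big) \ge a_l / b_l$ if $a_j / b_j \ge a_l / b_l$ for every $j$; and the latter condition holds because for $\max(0, l - n + m) \le j \le l - 1$ (since $a_j = b_j = 0$ if $j < l - n + m$, these pairs can be ignored altogether),

$$\frac{1}{\binom{m}{j} \binom{n-m}{l-j}} \sum_{\substack{1 \le i_1 < \dots < i_j \le m \\ m+1 \le i_{j+1} < \dots < i_l \le n}} K(x_{i_1}, \dots, x_{i_l})$$
$$= \frac{1}{\binom{m}{j} \binom{n-m}{l-j}} \sum_{\substack{1 \le i_1 < \dots < i_j \le m \\ m+1 \le i_{j+1} < \dots < i_l \le n}} \frac{1}{\binom{m-j}{l-j}} \sum_{\substack{1 \le r_{j+1} < \dots < r_l \le m \\ \{r_{j+1}, \dots, r_l\} \cap \{i_1, \dots, i_j\} = \emptyset}} K(x_{i_1}, \dots, x_{i_j}, x_{i_{j+1}}, \dots, x_{i_l})$$
$$\ge \frac{1}{\binom{m}{j} \binom{m-j}{l-j}} \sum_{1 \le i_1 < \dots < i_j \le m} \ \sum_{\substack{1 \le r_{j+1} < \dots < r_l \le m \\ \{r_{j+1}, \dots, r_l\} \cap \{i_1, \dots, i_j\} = \emptyset}} K(x_{i_1}, \dots, x_{i_j}, x_{r_{j+1}}, \dots, x_{r_l})$$
$$= \frac{1}{\binom{m}{l}} \sum_{1 \le i_1 < \dots < i_l \le m} K(x_{i_1}, \dots, x_{i_l}),$$

where the inequality follows by assumption (ii) and the symmetry of $K$.

If on the other hand $f(\xi) < f(\eta)$, let $x_{m+r} := x_1$, $1 \le r \le n - m$. It follows in exactly the same way as for the first case, only this time with "$\ge$" replaced by "$\le$" and using assumption (iii) instead of assumption (ii), that $f(\tilde\xi) \le f(\xi)$.

In total, we thus obtain

$$|f(\xi) - f(\eta)| \le |f(\tilde\xi) - f(\eta)| \le d_1(\tilde\xi, \eta) = \bar d_1(\tilde\xi, \eta) \le \bar d_1(\xi, \eta),$$

where the second inequality follows from the $d_1$-Lipschitz continuity of the functions considered in Proposition 2.A of Schuhmacher (2007b). □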



Proof of Proposition 3.B. Fix arbitrary $a_0, a_1 \in [0,1]$ and define $F(\xi) := a_i$ if $|\xi| = i \in \{0, 1\}$ and $F(\xi) := f(\xi)$ otherwise. Let $\xi = \sum_{i=1}^m \delta_{x_i}$ and $\eta = \sum_{i=1}^n \delta_{y_i}$, where without loss of generality we assume $m \le n$. Since $|F(\xi) - F(\eta)| \le 1 \le (\tau_D + 1) \frac{1}{2} \le (\tau_D + 1) \bar d_1(\xi,\eta)$ if $m \in \{0, 1\}$ and $n > m$, the Lipschitz inequality remains to be shown for $n \ge m \ge 2$ only.

As before, we bring the cardinalities to the same level. Let $\tilde\xi := \sum_{i=1}^n \delta_{x_i}$, where the points $x_{m+1}, \dots, x_n$ are chosen in the following way. If $f(\xi) \ge f(\eta)$, let $x_{m+1}, \dots, x_n$ be arbitrary pairwise distinct points in $\mathbb R^D$ that are at $d_0$-distance 1 from each other and from $\mathcal X$. Hence $f(\tilde\xi) \ge f(\xi)$ because for each of the added points the distance to its nearest neighbor is one, which is maximal. If on the other hand $f(\xi) < f(\eta)$, let $x_{m+1} := \dots := x_n := x_1$, whence it is immediately clear that $f(\tilde\xi) \le f(\xi)$ because for each of the added points the distance to its nearest neighbor is zero.

In total, we obtain

$$|f(\xi) - f(\eta)| \le |f(\tilde\xi) - f(\eta)| \le (\tau_D + 1)\, d_1(\tilde\xi, \eta) = (\tau_D + 1)\, \bar d_1(\tilde\xi, \eta) \le (\tau_D + 1)\, \bar d_1(\xi, \eta),$$

where the second inequality follows from the $d_1$-Lipschitz continuity of the average nearest neighbor distance considered in Proposition 2.C of Schuhmacher (2007b). □



4 Distance estimates in $\bar d_2$

In this section we present upper bounds for some essential $\bar d_2$-distances, which all clearly improve
on the bounds that are available for the corresponding $d_2$-distances. However, the improvement
in general results is not always as much as one would hope it to be, and it seems that considerably 
better bounds can be obtained by a more specialized treatment (see for example Subsection [ 



4.1 Poisson process approximation of a general point process 

Using the fact that

$$\mathcal A h(\xi) := \int_{\mathcal X} [h(\xi + \delta_\alpha) - h(\xi)] \, \boldsymbol\lambda(d\alpha) + \int_{\mathcal X} [h(\xi - \delta_\alpha) - h(\xi)] \, \xi(d\alpha) \tag{4.1}$$

is the generator of the spatial immigration-death process whose steady state distribution is the Poisson process with expectation measure $\boldsymbol\lambda$, Barbour and Brown (1992a) establish the Stein identity for Poisson process approximation as

$$\mathcal A h(\xi) = f(\xi) - \mathrm{Po}(\boldsymbol\lambda)(f) \tag{4.2}$$

for suitable test functions $f$ on $\mathfrak N$. The solution for (4.2) is given by

$$h_f(\xi) = -\int_0^\infty [\mathbb E f(\mathbf Z_\xi(t)) - \mathrm{Po}(\boldsymbol\lambda)(f)] \, dt, \tag{4.3}$$

where $\mathbf Z_\xi$ is an immigration-death process with generator $\mathcal A$ and initial point pattern $\mathbf Z_\xi(0) = \xi$.

Using (4.2) and different characteristics of point processes, we can establish various Poisson process approximation error bounds (see Barbour and Brown (1992a), Barbour and Brown (1992b), Barbour et al. (1998), and Chen and Xia (2004)). To keep our text concise, we present here a slightly simplified version of the main result in Chen and Xia (2004) only; it is an obvious exercise to apply our estimates (4.4) and (4.5) to get parallel results in the other articles mentioned above.

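We assume that, for each $\alpha \in \mathcal X$, there is a Borel set $A_\alpha \subset \mathcal X$ such that $\alpha \in A_\alpha$ and the mapping

$$\mathcal X \times \mathfrak N \to \mathcal X \times \mathfrak N : (\alpha, \xi) \mapsto (\alpha, \xi|_{A_\alpha^c})$$

is product measurable, where $\xi|_{A_\alpha^c}$ stands for the point pattern of $\xi$ restricted to $A_\alpha^c$ (Kallenberg, 1986, Section 1.1). Such a requirement can be ensured by $A = \{(x, y);\ y \in A_x,\ x \in \mathcal X\}$ being measurable in $\mathcal X^2$ (see Chen and Xia, 2004). We define, for any function $h$ on $\mathfrak N$, that

$$\Delta h(\xi) := \sup_{\alpha \in \mathcal X} |h(\xi + \delta_\alpha) - h(\xi)|,$$

$$\Delta^2 h(\xi) := \sup_{\eta - \xi \in \mathfrak N,\ \alpha, \beta \in \mathcal X} |h(\eta + \delta_\alpha + \delta_\beta) - h(\eta + \delta_\alpha) - h(\eta + \delta_\beta) + h(\eta)|, \quad \xi \in \mathfrak N.$$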



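Theorem 4.A (Chen and Xia, 2004). For each bounded measurable function $f : \mathfrak N \to \mathbb R_+$, let $h_f$ be the solution (4.3) of Equation (4.2). If $\Xi$ is a point process on $\mathcal X$ with expectation measure $\boldsymbol\lambda$, then

$$|\mathbb E f(\Xi) - \mathrm{Po}(\boldsymbol\lambda)(f)| \le \mathbb E \int_{\mathcal X} \Delta^2 h_f(\Xi|_{A_\alpha^c}) (\Xi(A_\alpha) - 1) \, \Xi(d\alpha)$$
$$+ \mathbb E \int_{\mathcal X} \big| [h_f(\Xi|_{A_\alpha^c}) - h_f(\Xi|_{A_\alpha^c} + \delta_\alpha)] - [h_f(\Xi_\alpha|_{A_\alpha^c}) - h_f(\Xi_\alpha|_{A_\alpha^c} + \delta_\alpha)] \big| \, \boldsymbol\lambda(d\alpha)$$
$$+ \mathbb E \int_{\mathcal X} \Delta^2 h_f(\Xi|_{A_\alpha^c})\, \Xi(A_\alpha) \, \boldsymbol\lambda(d\alpha),$$

where $\Xi_\alpha$ is the Palm process of $\Xi$ at location $\alpha \in \mathcal X$ (Kallenberg, 1986, Chapter 10).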



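The error bounds for Poisson process approximation like Theorem 4.A (see Barbour and Brown (1992a), Barbour and Brown (1992b), Barbour et al. (1998), and Chen and Xia (2004) for full details) pivot on the estimates of $\Delta h_f$ and $\Delta^2 h_f$. The following proposition summarizes these estimates for $\bar d_2$.

Proposition 4.B. Let

$$\Delta h(\xi; \alpha) := h(\xi + \delta_\alpha) - h(\xi),$$
$$\Delta^2 h(\xi; \alpha, \beta) := h(\xi + \delta_\alpha + \delta_\beta) - h(\xi + \delta_\alpha) - h(\xi + \delta_\beta) + h(\xi), \quad \xi \in \mathfrak N,\ \alpha, \beta \in \mathcal X;$$

then for each $\bar d_1$-Lipschitz function $f$, we have

$$|\Delta h_f(\xi; \alpha)| \le \min\Big\{ 1,\ \frac{0.95 + \ln^+ \lambda}{\lambda},\ \frac{1 - e^{-|\xi| \wedge \lambda}}{|\xi| \wedge \lambda} \Big\}, \tag{4.4}$$

$$|\Delta^2 h_f(\xi; \alpha, \beta)| \le \min\Big\{ 0.75,\ \frac{1.09}{|\xi| + 1} + \frac{1}{\lambda},\ \frac{2 \ln \lambda}{\lambda} 1_{\{\lambda \ge 1.76\}} + 0.75 \cdot 1_{\{\lambda < 1.76\}} \Big\}, \tag{4.5}$$

where $\frac{1 - e^{-0}}{0} := 1$ and $\lambda = \boldsymbol\lambda(\mathcal X)$.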
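To get a feeling for the size of the bounds in Proposition 4.B, they can be evaluated numerically. The sketch below is an illustration only; it reads the second term of (4.5) as $1.09/(|\xi|+1) + 1/\lambda$, which is the form established in (4.17) and (4.18) of the proof, and the function names are ours.

```python
from math import exp, log


def stein_bound_first(n, lam):
    """Right-hand side of (4.4) for a pattern with n points and total mass lam > 0."""
    a = min(n, lam)
    # convention from the proposition: (1 - e^{-0}) / 0 := 1
    third = 1.0 if a == 0 else (1.0 - exp(-a)) / a
    return min(1.0, (0.95 + max(log(lam), 0.0)) / lam, third)


def stein_bound_second(n, lam):
    """Right-hand side of (4.5), with 1.09 / (n + 1) + 1 / lam as the second term."""
    log_term = 2.0 * log(lam) / lam if lam >= 1.76 else 0.75
    return min(0.75, 1.09 / (n + 1) + 1.0 / lam, log_term)
```

For example, `stein_bound_second(100, 100.0)` is attained by the second term, whereas for a small pattern and large $\lambda$ the logarithmic term becomes the smallest of the three.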



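Proof. For convenience, we write $|\xi| = n$ and $|\mathbf Z_\xi(t)| = Z_\xi(t)$. Let $\tau_1$ and $\tau_2$ be independent exponential random variables with mean 1 which are also independent of $\mathbf Z_\xi$; then one can write

$$\mathbf Z_{\xi + \delta_\alpha}(t) = \mathbf Z_\xi(t) + \delta_\alpha 1_{\{\tau_1 > t\}}, \quad \mathbf Z_{\xi + \delta_\beta}(t) = \mathbf Z_\xi(t) + \delta_\beta 1_{\{\tau_2 > t\}},$$

and $\mathbf Z_{\xi + \delta_\alpha + \delta_\beta}(t) = \mathbf Z_\xi(t) + \delta_\alpha 1_{\{\tau_1 > t\}} + \delta_\beta 1_{\{\tau_2 > t\}}$.

Hence it follows from (4.3) and the $\bar d_1$-Lipschitz property of $f$ that

$$|\Delta h_f(\xi; \alpha)| = \Big| \int_0^\infty e^{-t}\, \mathbb E[f(\mathbf Z_\xi(t) + \delta_\alpha) - f(\mathbf Z_\xi(t))] \, dt \Big| \tag{4.6}$$
$$\le \int_0^\infty e^{-t}\, \mathbb E \frac{1}{Z_\xi(t) + 1} \, dt \tag{4.7}$$
$$\le \int_0^\infty e^{-t} \, dt = 1.$$

Also,

$$|\Delta^2 h_f(\xi; \alpha, \beta)| = \Big| \int_0^\infty e^{-2t}\, \mathbb E[f(\mathbf Z_\xi(t) + \delta_\alpha + \delta_\beta) - f(\mathbf Z_\xi(t) + \delta_\alpha) - f(\mathbf Z_\xi(t) + \delta_\beta) + f(\mathbf Z_\xi(t))] \, dt \Big| \tag{4.8}$$
$$\le \int_0^\infty e^{-2t}\, \mathbb E \Big[ \frac{1}{Z_\xi(t) + 2} + \frac{1}{Z_\xi(t) + 1} \Big] dt \tag{4.9}$$
$$\le 1.5 \int_0^\infty e^{-2t} \, dt = 0.75. \tag{4.10}$$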

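However, since $\mathbf Z_\xi$ has constant immigration rate $\lambda$ and unit per capita death rate, it is possible to write

$$\mathbf Z_\xi(t) = \mathbf Z_0(t) + \mathbf D_\xi(t),$$

where $\mathbf D_\xi$ is a pure death process with unit per capita death rate independent of $\mathbf Z_0$. Direct verification gives that $Z_0(t)$ follows the Poisson distribution with mean $\lambda_t := \lambda(1 - e^{-t})$, while $|\mathbf D_\xi(t)|$ follows $\mathrm{Bi}(|\xi|, e^{-t})$. Hence

$$\mathbb E \frac{1}{Z_\xi(t) + 1} \le \mathbb E \frac{1}{Z_0(t) + 1} = \frac{1 - e^{-\lambda_t}}{\lambda_t}, \tag{4.11}$$

$$\mathbb E \frac{1}{Z_\xi(t) + 1} = \int_0^1 \mathbb E x^{Z_\xi(t)} \, dx = \int_0^1 [1 - e^{-t}(1 - x)]^n e^{-\lambda_t (1 - x)} \, dx$$
$$\le \int_0^1 e^{-(n e^{-t} + \lambda_t)(1 - x)} \, dx \le \int_0^1 e^{-(n \wedge \lambda)(1 - x)} \, dx = \frac{1 - e^{-n \wedge \lambda}}{n \wedge \lambda}, \tag{4.12}$$

and similarly,

$$\mathbb E \frac{1}{Z_\xi(t) + 2} = \int_0^1 \mathbb E x^{Z_\xi(t) + 1} \, dx = \int_0^1 x [1 - e^{-t}(1 - x)]^n e^{-\lambda_t (1 - x)} \, dx$$
$$\le \int_0^1 x e^{-(n \wedge \lambda)(1 - x)} \, dx = \frac{1}{n \wedge \lambda} - \frac{1}{(n \wedge \lambda)^2} (1 - e^{-n \wedge \lambda}). \tag{4.13}$$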

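The claim

$$|\Delta h_f(\xi; \alpha)| \le \frac{0.95 + \ln^+ \lambda}{\lambda} \tag{4.14}$$

is obvious for $\lambda < 0.95$ as the right hand side is already greater than 1, so it remains to show (4.14) for $\lambda \ge 0.95$. Combining (4.7) and (4.11), with $s = 1 - e^{-t}$, we get

$$|\Delta h_f(\xi; \alpha)| \le \int_0^\infty e^{-t} \frac{1 - e^{-\lambda_t}}{\lambda_t} \, dt = \int_0^1 \frac{1 - e^{-\lambda s}}{\lambda s} \, ds \le \frac{1}{\lambda} \Big( \frac{e^{-\lambda}}{\lambda} + \ln \lambda + \gamma \Big),$$

where $\gamma$ is the Euler constant and the last inequality is due to items 5.1.39 and 5.1.19 of Abramowitz and Stegun (1972). For $0.95 \le \lambda < 1$, $\frac{e^{-\lambda}}{\lambda} + \ln \lambda + \gamma \le e^{-1} + \gamma < 0.95$ since $\frac{e^{-\lambda}}{\lambda} + \ln \lambda + \gamma$ is increasing for $\lambda \ge 0.95$, and for $\lambda \ge 1$, $\frac{e^{-\lambda}}{\lambda} + \gamma \le e^{-1} + \gamma < 0.95$ because the function $\frac{e^{-\lambda}}{\lambda} + \gamma$ is decreasing, completing the proof of (4.14). The last claim in (4.4) is easily obtained from (4.7) and (4.12).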

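We then apply (4.12) and (4.13) in (4.9) to obtain

$$|\Delta^2 h_f(\xi; \alpha, \beta)| \le \frac{0.5}{n \wedge \lambda} \Big\{ 2 - e^{-n \wedge \lambda} - \frac{1}{n \wedge \lambda} (1 - e^{-n \wedge \lambda}) \Big\}. \tag{4.15}$$

Now, we show that

$$|\Delta^2 h_f(\xi; \alpha, \beta)| \le \frac{1.09}{n + 1} + \frac{1}{\lambda}. \tag{4.17}$$

As a matter of fact, by (4.10) and (4.16), (4.17) clearly holds for $n = 0$ and $n > \lambda$, hence it remains to show (4.17) for $1 \le n \le \lambda$. Using (4.15), it suffices to prove that

$$\frac{0.5 (n + 1)}{n} \Big\{ 2 - e^{-n} - \frac{1}{n} (1 - e^{-n}) \Big\} \le 1.09. \tag{4.18}$$

However, for $n \ge 12$,

$$\frac{0.5 (n + 1)}{n} \Big\{ 2 - e^{-n} - \frac{1}{n} (1 - e^{-n}) \Big\} \le \frac{n + 1}{n} \le \frac{13}{12} < 1.09,$$

while for $1 \le n \le 11$, one can verify (4.18) for each value of $n$.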
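Finally, we prove

$$|\Delta^2 h_f(\xi; \alpha, \beta)| \le \frac{2 \ln \lambda}{\lambda} 1_{\{\lambda \ge 1.76\}} + 0.75 \cdot 1_{\{\lambda < 1.76\}}. \tag{4.19}$$

The claim (4.19) is evident for $\lambda < 1.76$, so we assume $\lambda \ge 1.76$. On the other hand, if $Y$ follows $\mathrm{Po}(\nu)$, then

$$\mathbb E \frac{1}{Y + 2} = \mathbb E \frac{1}{Y + 1} - \mathbb E \frac{1}{(Y + 1)(Y + 2)} = \frac{\nu - 1 + e^{-\nu}}{\nu^2}.$$

Therefore,

$$\mathbb E \Big\{ \frac{1}{Z_\xi(t) + 1} + \frac{1}{Z_\xi(t) + 2} \Big\} \le \mathbb E \Big\{ \frac{1}{Z_0(t) + 1} + \frac{1}{Z_0(t) + 2} \Big\} = \frac{2 \lambda_t - 1 + (1 - \lambda_t) e^{-\lambda_t}}{\lambda_t^2},$$

which, together with (4.9), ensures that

$$|\Delta^2 h_f(\xi; \alpha, \beta)| \le \int_0^\infty e^{-2t} \frac{2 \lambda_t - 1 + (1 - \lambda_t) e^{-\lambda_t}}{\lambda_t^2} \, dt$$
$$= \int_0^1 (1 - s) \frac{2 \lambda s - 1 + (1 - \lambda s) e^{-\lambda s}}{(\lambda s)^2} \, ds$$
$$= -\frac{3}{\lambda} + \frac{2 (1 - e^{-\lambda})}{\lambda^2} + \Big( \frac{2}{\lambda} + \frac{1}{\lambda^2} \Big) \int_0^\lambda \frac{1 - e^{-x}}{x} \, dx$$
$$\le -\frac{3}{\lambda} + \frac{2 (1 - e^{-\lambda})}{\lambda^2} + \Big( \frac{2}{\lambda} + \frac{1}{\lambda^2} \Big) \Big( \frac{e^{-\lambda}}{\lambda} + \ln \lambda + \gamma \Big)$$
$$= -\frac{3}{\lambda} + \frac{2}{\lambda^2} + \frac{e^{-\lambda}}{\lambda^3} + \Big( \frac{2}{\lambda} + \frac{1}{\lambda^2} \Big) (\ln \lambda + \gamma) =: a(\lambda),$$

where the first equality is by the change of variable $s = 1 - e^{-t}$ and the last inequality is from items 5.1.39 and 5.1.19 of Abramowitz and Stegun (1972). Now, $b(\lambda) := a(\lambda) \lambda - 2 \ln \lambda$ is decreasing in $\lambda$ for $\lambda \ge 1$ and $b(1.76) < 0$, which implies that $a(\lambda) \le \frac{2 \ln \lambda}{\lambda}$ for $\lambda \ge 1.76$, completing the proof of (4.19). □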
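The two numerical facts used in the proof, namely (4.18) for $1 \le n \le 11$ and $b(1.76) < 0$, are easy to check by machine. The following sketch does so; it assumes the closed form of $a(\lambda)$ obtained by expanding the final bound of the last display, and the function names are ours.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def lhs_418(n):
    """Left-hand side of (4.18)."""
    return 0.5 * (n + 1) / n * (2.0 - exp(-n) - (1.0 - exp(-n)) / n)


def a(lam):
    """Assumed closed form of a(lambda) from the proof of (4.19)."""
    return (-3.0 / lam + 2.0 / lam**2 + exp(-lam) / lam**3
            + (2.0 / lam + 1.0 / lam**2) * (log(lam) + EULER_GAMMA))


def b(lam):
    """b(lambda) = a(lambda) * lambda - 2 ln(lambda)."""
    return a(lam) * lam - 2.0 * log(lam)


# largest value of the left-hand side of (4.18) over n = 1, ..., 11
worst_case = max(lhs_418(n) for n in range(1, 12))
```

The maximum over $n = 1, \dots, 11$ is attained at $n = 3$ and stays just below 1.09, so the constant in (4.17) leaves very little slack.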



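The following counter-example, adapted from Brown and Xia (1995b), shows that the logarithmic factors in (4.4) and (4.5) cannot be removed.

Example. Let $\mathcal X = \{0, 1\}$ with metric $d_0(x,y) = |x - y|$, let $\boldsymbol\lambda$ satisfy $\boldsymbol\lambda\{1\} = 1$ and $\boldsymbol\lambda\{0\} = \lambda - 1 > 0$, and define a $\bar d_1$-Lipschitz function on $\mathfrak N$ as

$$f(\xi) = \begin{cases} \dfrac{1}{\xi\{0\} + 1} & \text{if } \xi\{1\} = 0, \\ 0 & \text{if } \xi\{1\} > 0. \end{cases}$$

Using the fact that $\mathbf Z_0(t)\{0\}$ follows $\mathrm{Po}((\lambda - 1)(1 - e^{-t}))$ and $\mathbf Z_0(t)\{1\}$ follows $\mathrm{Po}(1 - e^{-t})$, we have from (4.6) and (4.8) that, as $\lambda \to \infty$,

$$|\Delta h_f(0; 1)| = \int_0^\infty e^{-t}\, \mathbb E \Big[ \frac{1}{\mathbf Z_0(t)\{0\} + 1} 1_{[\mathbf Z_0(t)\{1\} = 0]} \Big] dt$$
$$= \int_0^\infty e^{-t} \frac{1 - e^{-(\lambda - 1)(1 - e^{-t})}}{(\lambda - 1)(1 - e^{-t})} e^{-(1 - e^{-t})} \, dt$$
$$= \int_0^1 \frac{1 - e^{-(\lambda - 1) s}}{(\lambda - 1) s} e^{-s} \, ds \quad (\text{where } s = 1 - e^{-t})$$
$$\ge \frac{e^{-1}}{\lambda - 1} \int_0^{\lambda - 1} \frac{1 - e^{-u}}{u} \, du \asymp \frac{\ln \lambda}{\lambda}$$

and

$$|\Delta^2 h_f(0; 1, 1)| = \int_0^\infty e^{-2t}\, \mathbb E \Big[ \frac{1}{\mathbf Z_0(t)\{0\} + 1} 1_{[\mathbf Z_0(t)\{1\} = 0]} \Big] dt$$
$$= \int_0^\infty e^{-2t} \frac{1 - e^{-(\lambda - 1)(1 - e^{-t})}}{(\lambda - 1)(1 - e^{-t})} e^{-(1 - e^{-t})} \, dt$$
$$= \int_0^1 (1 - s) \frac{1 - e^{-(\lambda - 1) s}}{(\lambda - 1) s} e^{-s} \, ds \quad (\text{where } s = 1 - e^{-t})$$
$$\ge \frac{e^{-1/2}}{2 (\lambda - 1)} \int_0^{(\lambda - 1)/2} \frac{1 - e^{-u}}{u} \, du \asymp \frac{\ln \lambda}{\lambda}.$$

□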



As noted before, $\bar d_1$ is the same as $d_1$ when the point patterns have the same number of points while it is smoother than $d_1$ when the point patterns do not have the same number of points. On the other hand, for any two point processes $\Xi$ and $\mathrm H$ on $\mathcal X$, we have

$$\mathbb E\, d_1(\Xi, \mathrm H) = \mathbb E\big( d_1(\Xi, \mathrm H) \,\big|\, |\Xi| = |\mathrm H| \big)\, \mathbb P[|\Xi| = |\mathrm H|] + \mathbb P[|\Xi| \ne |\mathrm H|]. \tag{4.20}$$

When we consider $\mathbb P[|\Xi| \ne |\mathrm H|]$, which corresponds to the total variation distance between the distributions of the total number of points of the two point processes (see Barbour and Brown, 1992b), there is no such logarithmic component in Stein's factor, which means that the logarithmic component in $d_1$ was brought in only by the discrepancies of locations of points when the point patterns have the same number of points. However, this problem is shared by $\bar d_1$, that is, the Stein factors for $\bar d_1$ will inevitably have the logarithmic component as well.

It is also worthwhile to note that, since $\mathbb E\, \bar d_1(\Xi, \mathrm H)$ replaces the term $\mathbb P[|\Xi| \ne |\mathrm H|]$ in (4.20) with a smaller $\mathbb E\, d_R(|\Xi|, |\mathrm H|)$, we would expect a bigger improvement on bounding $\bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H))$ when $\mathbb P[|\Xi| \ne |\mathrm H|]$ is "dominant" at the right hand side of (4.20) under the best coupling. Such an improvement is obtained in the next two subsections.



4.2 Poisson process approximation of a Bernoulli process 



Let $\mathcal X = [0,1]$ with $d_0(x,y) = |x - y|$, and let $X_1, \dots, X_n$ be independent and identically distributed Bernoulli random variables with $\mathbb P[X_i = 1] = p$. Then $\Xi = \sum_{i=1}^n X_i \delta_{i/n}$ defines a Bernoulli process on $\mathcal X$. If we let $T_0, T_1, \dots, T_n$ be independent and identically distributed uniform random variables on $\mathcal X$ which are also independent of $\{X_1, \dots, X_n\}$, then

$$Y = \sum_{i=1}^n X_i \delta_{T_i}$$

defines a binomial process on $\mathcal X$ (Reiss, 1993, p. 29). By Xia and Zhang (2007),



«^ ( H)^(y,)<(i- + !)A-J=. 



(4.21) 



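To estimate $\bar d_2(\mathcal L(Y), \mathrm{Po}(\boldsymbol\lambda))$ with $\boldsymbol\lambda(dx) = np \, dx$, we employ Stein's method for Poisson process approximation. As a matter of fact, it follows from (4.1) that

$$\mathbb E \mathcal A h(Y) = \mathbb E \Big( \int_{\mathcal X} [h(Y + \delta_\alpha) - h(Y)] \, \boldsymbol\lambda(d\alpha) + \int_{\mathcal X} (h(Y - \delta_\alpha) - h(Y)) \, Y(d\alpha) \Big)$$
$$= np\, \mathbb E (h(Y + \delta_{T_0}) - h(Y)) + \sum_{i=1}^n \mathbb E (h(Y^i) - h(Y^i + \delta_{T_i}))\, p$$
$$= np\, \mathbb E \{ (h(Y + \delta_{T_0}) - h(Y)) - (h(Y^1 + \delta_{T_0}) - h(Y^1)) \},$$

where $Y^i = Y - X_i \delta_{T_i}$. Define

$$g(i) = \mathbb E (h(Y + \delta_{T_0}) - h(Y) \mid |Y| = i) = \mathbb E h\Big( \sum_{j=1}^{i+1} \delta_{T_j} \Big) - \mathbb E h\Big( \sum_{j=1}^{i} \delta_{T_j} \Big);$$

then

$$|\mathbb E \mathcal A h(Y)| = np\, |\mathbb E (g(|Y|) - g(|Y^1|))| = np^2\, |\mathbb E (g(|Y^1| + 1) - g(|Y^1|))|$$
$$\le 2 n p^2 \|g\|\, d_{TV}(\mathcal L(|Y^1|), \mathcal L(|Y^1| + 1)),$$

where $\|\cdot\|$ denotes the supremum norm and, for any two nonnegative integer-valued random variables $U_1$ and $U_2$,

$$d_{TV}(\mathcal L(U_1), \mathcal L(U_2)) := \frac{1}{2} \sup_{g : \mathbb Z_+ \to [-1, 1]} |\mathbb E g(U_1) - \mathbb E g(U_2)|.$$

On the other hand, by Lemma 1 in Barbour and Jensen (1989),

$$d_{TV}(\mathcal L(|Y^1|), \mathcal L(|Y^1| + 1)) \le \max_{0 \le i \le n - 1} \mathbb P[|Y^1| = i] \le 1 \wedge \frac{1}{2 \sqrt{(n - 1) p (1 - p)}},$$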
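and using (4.4), we have, for $f \in \bar{\mathcal F}_2$, that

$$|\mathbb E \mathcal A h_f(Y)| \le \frac{(0.95 + \ln^+(np))\, p}{\frac{1}{2} \vee \sqrt{(n - 1) p (1 - p)}},$$

which implies from (4.2) that

$$\bar d_2(\mathcal L(Y), \mathrm{Po}(\boldsymbol\lambda)) = \sup_{f \in \bar{\mathcal F}_2} |\mathbb E \mathcal A h_f(Y)| \le \frac{(0.95 + \ln^+(np))\, p}{\frac{1}{2} \vee \sqrt{(n - 1) p (1 - p)}}. \tag{4.22}$$

Now, collecting (4.21) and (4.22) gives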
Theorem 4.C. With the above setup, we have 

4(*(s), po(a)) < C 2. + g 1 ) a » + ( a95 + 



2n 2/ ^/37^ i v - l)p(l -p)" 



Remark. An immediate message from Theorem 4.C is that, if $n$ is large, it is almost impossible to distinguish between the distributions of the two processes. It is quite a contrast to the conclusion under $d_1$ where it is essential to have a very small $p$ as well as a large $n$ to ensure a valid Poisson process approximation (see Xia, 1997). In practice, statisticians would use a Poisson process rather than a Bernoulli process when $n$ is large, confirming our conclusion under $\bar d_1$.
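The dependence on $n$ described in the remark can be made concrete for the binomial-to-Poisson part of the bound. The sketch below evaluates the right-hand side of (4.22), read as $(0.95 + \ln^+(np))\,p / (\tfrac12 \vee \sqrt{(n-1)p(1-p)})$, which is what the two estimates preceding it combine to; the function name is ours and the contribution of (4.21) is not included.

```python
from math import log, sqrt


def binomial_to_poisson_bound(n, p):
    """Right-hand side of (4.22) for n >= 1 trials and success probability 0 < p < 1."""
    lam = n * p
    numerator = (0.95 + max(log(lam), 0.0)) * p
    denominator = max(0.5, sqrt((n - 1) * p * (1 - p)))
    return numerator / denominator


# for fixed p the bound decays roughly like ln(n) / sqrt(n)
values = [binomial_to_poisson_bound(n, 0.1) for n in (10**2, 10**4, 10**6)]
```

Even for the moderate value $p = 0.1$ the bound keeps shrinking as $n$ grows, which is the behaviour the remark contrasts with the $d_1$-based conclusion.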

Remark. It is a tantalizing problem to remove the $\ln^+ \lambda$ term in the upper bound. We conjecture that, at the cost of more complexity, the actual bound should be of order $(\frac{1}{n} + p) / (1 \vee \sqrt{np})$.



4.3 Point processes of i.i.d. points 

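Let $\Xi := \sum_{i=1}^M \delta_{X_i}$ and $\mathrm H := \sum_{i=1}^N \delta_{Y_i}$, where $M$ and $N$ are integer-valued random variables, $(X_i)_{i \in \mathbb N}$ is a sequence of i.i.d. $\mathcal X$-valued random elements that is independent of $M$, and $(Y_i)_{i \in \mathbb N}$ is a sequence of i.i.d. $\mathcal X$-valued random elements that is independent of $N$. Denote by $d_W$ the Wasserstein metric between random elements of $\mathcal X$ with respect to $d_0$.

Proposition 4.D. We have

$$\max\big( d_{RW}(\mathcal L(M), \mathcal L(N)),\ c_1 d_W(\mathcal L(X_1), \mathcal L(Y_1)) \big)$$
$$\le \bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H)) \le d_{RW}(\mathcal L(M), \mathcal L(N)) + c_2 d_W(\mathcal L(X_1), \mathcal L(Y_1)),$$

where

$$c_1 = c_1(\mathcal L(M), \mathcal L(N)) = \max(\mathbb P[M > 0], \mathbb P[N > 0])$$

and

$$c_2 = c_2(\mathcal L(M), \mathcal L(N)) = \mathbb E \Big( \frac{\tilde M \wedge \tilde N}{\tilde M \vee \tilde N} 1_{\{\tilde M \vee \tilde N > 0\}} \Big) \le \min(\mathbb P[M > 0], \mathbb P[N > 0])$$

for random variables $\tilde M \overset{\mathcal D}{=} M$ and $\tilde N \overset{\mathcal D}{=} N$ that are coupled so that $\mathbb E\, d_R(\tilde M, \tilde N) = d_{RW}(\mathcal L(M), \mathcal L(N))$.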
Remark 4.E. An interesting special case is given if $\Xi$ and $\mathrm H$ are Poisson processes. For finite measures $\boldsymbol\mu$ and $\boldsymbol\nu$ on $\mathcal X$ with total masses $\mu := \boldsymbol\mu(\mathcal X)$ and $\nu := \boldsymbol\nu(\mathcal X)$, we obtain from Proposition 4.D that

$$\bar d_2(\mathrm{Po}(\boldsymbol\mu), \mathrm{Po}(\boldsymbol\nu)) \le \frac{|\mu - \nu|}{\mu \vee \nu} + (1 - e^{-\mu \wedge \nu})\, d_W(\boldsymbol\mu / \mu, \boldsymbol\nu / \nu),$$

which is an improvement by a factor of order $1 / \sqrt{\mu \vee \nu}$ for $\mu, \nu \to \infty$ in the first summand when compared to a corresponding $d_2$-bound (see for example Brown and Xia, 1995a, Equation (2.8)). Estimation of the $d_{RW}$-term was achieved by considering a Poisson process $Z$ on $\mathbb R_+$ with intensity 1 and defining a coupling pair by $\tilde M := Z((0, \mu])$ and $\tilde N := Z((0, \nu])$.
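The bound of Remark 4.E is simple to evaluate once the Wasserstein distance between the normalized measures is known. The sketch below takes that distance as an input `dw` (computing it is a separate problem); the second function gives the value of $\mathbb E\, d_R(\tilde M, \tilde N)$ under the coupling of the remark, which we derived ourselves from the fact that, given the larger count, the smaller count is binomial with success probability $(\mu \wedge \nu)/(\mu \vee \nu)$. Both function names are ours.

```python
from math import exp


def poisson_process_bound(mu, nu, dw):
    """Upper bound of Remark 4.E for total masses mu, nu > 0 and a given d_W value."""
    return abs(mu - nu) / max(mu, nu) + (1.0 - exp(-min(mu, nu))) * dw


def coupled_relative_difference(mu, nu):
    """E d_R(M, N) for M = Z((0, mu]), N = Z((0, nu]) with Z a unit-rate Poisson process.

    Our own calculation (an assumption of this sketch, not a formula from the text):
    (1 - small / large) * P[larger count > 0].
    """
    small, large = min(mu, nu), max(mu, nu)
    return (1.0 - small / large) * (1.0 - exp(-large))
```

The second function never exceeds the first summand $|\mu - \nu| / (\mu \vee \nu)$ of the bound, in line with the way the $d_{RW}$-term is estimated in the remark.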
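Proof of Proposition 4.D. Upper bound: Let $\tilde M \overset{\mathcal D}{=} M$ and $\tilde N \overset{\mathcal D}{=} N$ be coupled according to a $d_{RW}$-coupling, so that $\mathbb E \Big( \frac{|\tilde M - \tilde N|}{\tilde M \vee \tilde N} 1_{\{\tilde M \vee \tilde N > 0\}} \Big) = d_{RW}(\mathcal L(M), \mathcal L(N))$, and let $\tilde X_i \overset{\mathcal D}{=} X_i$ and $\tilde Y_i \overset{\mathcal D}{=} Y_i$ with $\mathbb E\, d_0(\tilde X_i, \tilde Y_i) = d_W(\mathcal L(X_1), \mathcal L(Y_1))$ for every $i \in \mathbb N$ in such a way that the pairs $(\tilde M, \tilde N), (\tilde X_1, \tilde Y_1), (\tilde X_2, \tilde Y_2), \dots$ are independent. We then obtain

$$\bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H)) \le \mathbb E\, \bar d_1\Big( \sum_{i=1}^{\tilde M} \delta_{\tilde X_i},\ \sum_{i=1}^{\tilde N} \delta_{\tilde Y_i} \Big),$$

which, by the independence between $(\tilde M, \tilde N)$ and $\{(\tilde X_i, \tilde Y_i),\ i \ge 1\}$, and the assumptions on the distributions of those pairs, yields the upper bound claimed.

The bound for the factor $c_2$ follows from $\mathbb E \Big( \frac{\tilde M \wedge \tilde N}{\tilde M \vee \tilde N} 1_{\{\tilde M \vee \tilde N > 0\}} \Big) \le \mathbb P[\tilde M > 0, \tilde N > 0]$ and $\mathbb P[\tilde M = \tilde N = 0] = \min(\mathbb P[M = 0], \mathbb P[N = 0])$, the proof of which is straightforward.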

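Lower bound: Let $\mathcal F^* := \{g : \mathcal X \to [0,1];\ |g(x) - g(y)| \le d_0(x,y) \text{ for all } x, y \in \mathcal X\}$, and let $g \in \mathcal F^*$ be a mapping with $\mathbb E g(X_1) - \mathbb E g(Y_1) = d_W(\mathcal L(X_1), \mathcal L(Y_1))$. Such a mapping exists by $d_W(\mathcal L(X_1), \mathcal L(Y_1)) = \sup_{g \in \mathcal F^*} |\mathbb E g(X_1) - \mathbb E g(Y_1)|$, where the supremum is attained because $\mathcal F^*$ is a compact subset of $C(\mathcal X, \mathbb R)$ by the Arzelà-Ascoli theorem and the mapping $[g \mapsto |\mathbb E g(X_1) - \mathbb E g(Y_1)|]$ is continuous (both statements are with respect to the topology of uniform convergence).

Define $f : \mathfrak N \to [0,1]$ by $f(\xi) := \frac{1}{|\xi|} \int_{\mathcal X} g(x) \, \xi(dx)$ for $\xi \in \mathfrak N \setminus \{0\}$ and $f(0) := \mathbb E g(X_1)$.

We next check that $f \in \bar{\mathcal F}_2$. It is immediately clear that $|f(\xi) - f(0)| \le 1 = \bar d_1(\xi, 0)$ if $\xi \in \mathfrak N \setminus \{0\}$. Let then $\xi = \sum_{i=1}^m \delta_{x_i}$ and $\eta = \sum_{i=1}^n \delta_{y_i}$ both be in $\mathfrak N \setminus \{0\}$, where we assume without loss of generality that $m \le n$ and $f(\xi) \ge f(\eta)$ (otherwise interchange $\xi$ and $\eta$ and/or replace $g$ by $1 - g \in \mathcal F^*$), and that the points are numbered according to a $\bar d_1$-pairing such that $\frac{1}{n} \big( \sum_{i=1}^m d_0(x_i, y_i) + (n - m) \big) = \bar d_1(\xi, \eta)$. Let $k \in \arg\max_{1 \le i \le m} g(x_i)$, and $x_i := x_k$ for $m + 1 \le i \le n$, which implies

$$|f(\xi) - f(\eta)| = \frac{1}{m} \sum_{i=1}^m g(x_i) - \frac{1}{n} \sum_{i=1}^n g(y_i)$$
$$\le \frac{1}{n} \sum_{i=1}^n g(x_i) - \frac{1}{n} \sum_{i=1}^n g(y_i)$$
$$\le \frac{1}{n} \sum_{i=1}^n |g(x_i) - g(y_i)|$$
$$\le \frac{1}{n} \sum_{i=1}^n d_0(x_i, y_i)$$
$$\le \bar d_1(\xi, \eta),$$

and therefore $f \in \bar{\mathcal F}_2$.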

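Choose pairs $(\tilde M, \tilde N), (\tilde X_1, \tilde Y_1), (\tilde X_2, \tilde Y_2), \dots$ in the same way as for the proof of the upper bound (although the coupling of $\tilde X_i$ and $\tilde Y_i$ in each of the pairs is not important now). We obtain

$$\bar d_2(\mathcal L(\Xi), \mathcal L(\mathrm H)) \ge \Big| \mathbb E f\Big( \sum_{i=1}^{\tilde M} \delta_{\tilde X_i} \Big) - \mathbb E f\Big( \sum_{j=1}^{\tilde N} \delta_{\tilde Y_j} \Big) \Big|$$
$$= \Big| \mathbb E \Big\{ \Big( \frac{1}{\tilde M} \sum_{i=1}^{\tilde M} g(\tilde X_i) - \frac{1}{\tilde N} \sum_{j=1}^{\tilde N} g(\tilde Y_j) \Big) 1_{\{\tilde M > 0, \tilde N > 0\}} \Big\}$$
$$+ \mathbb E \Big\{ \Big( \frac{1}{\tilde M} \sum_{i=1}^{\tilde M} g(\tilde X_i) - \mathbb E g(X_1) \Big) 1_{\{\tilde M > 0, \tilde N = 0\}} \Big\}$$
$$+ \mathbb E \Big\{ \Big( \mathbb E g(X_1) - \frac{1}{\tilde N} \sum_{j=1}^{\tilde N} g(\tilde Y_j) \Big) 1_{\{\tilde M = 0, \tilde N > 0\}} \Big\} \Big|$$
$$= (\mathbb E g(X_1) - \mathbb E g(Y_1))\, \mathbb P[\tilde M > 0, \tilde N > 0] + (\mathbb E g(X_1) - \mathbb E g(Y_1))\, \mathbb P[\tilde M = 0, \tilde N > 0]$$
$$= d_W(\mathcal L(X_1), \mathcal L(Y_1))\, \mathbb P[N > 0].$$

Since the above argument is symmetric in $\Xi$ and $\mathrm H$, we obtain the lower bound when combining it with Proposition 2.C(ii). □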

5 A statistical application 

In order to show the potential of $\bar d_1$ and $\bar d_2$ in statistical applications, we propose a test procedure based on these two metrics. Suppose that our data consists of a few i.i.d. realizations of a point process $\Xi$, and we would like to test if $\Xi \sim P$ for a certain probability measure $P$ on $\mathfrak N$. Such multiple point pattern data may arise, among other examples, from recording degenerate cells in tissue samples or plants in a large population that is sampled only via a few comparatively small windows.

In what follows, we restrict our attention to a test for spatial homogeneity under the assumption
that $\Xi$ is a Poisson process on $W = [0,1]^2$ with unknown expectation measure $\boldsymbol{\lambda}$.
This limits the alternative hypothesis sufficiently to keep our simulation study within the scope
of this article. Suppose that $\xi_1, \ldots, \xi_N$ are realizations of i.i.d. copies $\Xi_1, \ldots, \Xi_N$ of $\Xi$ and
that the total mass $\lambda := \boldsymbol{\lambda}([0,1]^2)$ of the expectation measure $\boldsymbol{\lambda}$ is known (otherwise we just
take the canonical estimate $\frac{1}{N} \sum_{i=1}^N |\xi_i|$). Our null hypothesis is then $\Xi \sim \mathrm{Po}(\lambda\, \mathrm{Leb}^2)$. Write
$P_N := \frac{1}{N} \sum_{j=1}^N \delta_{\xi_j} \in \mathcal{P}(\mathfrak{N})$ for the empirical distribution of our data. We perform a Monte Carlo
test where the test statistic would ideally be

$$T(\xi_1, \ldots, \xi_N) := \bar d_2\bigl(P_N, \mathrm{Po}(\lambda\, \mathrm{Leb}^2)\bigr), \tag{5.1}$$

but since this is computationally intractable, we replace it by the randomized test statistic 

$$T(\xi_1, \ldots, \xi_N; \eta_1, \ldots, \eta_N) := \bar d_2(P_N, Q_N), \tag{5.2}$$

where $Q_N := \frac{1}{N} \sum_{i=1}^N \delta_{\eta_i}$ for realizations $\eta_i$ of $\mathrm{Po}(\lambda\, \mathrm{Leb}^2)$-processes $\mathrm{H}_i$ that are independent
amongst each other and of the $\Xi_j$. The null hypothesis is rejected at significance level $\alpha = 0.05$
if $T(\xi_1, \ldots, \xi_N; \eta_1, \ldots, \eta_N)$ ranks among the five highest values when pooled with 99 simulations
of $T(\tilde\eta_1, \ldots, \tilde\eta_N; \eta_1, \ldots, \eta_N)$, where $\tilde\eta_1, \ldots, \tilde\eta_N, \eta_1, \ldots, \eta_N$ are all independent
$\mathrm{Po}(\lambda\, \mathrm{Leb}^2)$-realizations.

Figure 5.1: Left hand side: normalized intensity functions $f_\kappa$ plotted against their first coordinate; right
hand side: six independent realizations from $\mathrm{Po}(f_2 \cdot \mathrm{Leb}^2)$.
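The rejection rule of this Monte Carlo test can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: the function name `monte_carlo_rank_test` and its parameters are hypothetical, and the values of the test statistic are assumed to have been computed elsewhere.

```python
from typing import Sequence


def monte_carlo_rank_test(observed: float,
                          simulated: Sequence[float],
                          n_reject: int = 5) -> bool:
    """Return True if the null hypothesis is rejected.

    observed  -- test statistic computed from the data
    simulated -- statistics simulated under the null hypothesis (99 in the text)
    n_reject  -- reject if `observed` is among the `n_reject` largest pooled values
    """
    # Rank 1 means that the observed value is the largest of the pooled sample.
    rank = 1 + sum(1 for s in simulated if s > observed)
    return rank <= n_reject
```

With 99 simulated values and `n_reject = 5`, the observed value is one of 100 pooled values, so under the null hypothesis (and in the absence of ties) it falls among the five largest with probability 5/100, which matches the stated level of 0.05.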

We choose $N = 12$, $\lambda = 30$ for the simulation study, which is both realistic for actual
data and keeps computation times at a tolerable level. One single test of two series of 12
point patterns takes less than three seconds (given the simulated null hypothesis distribution)
on an ordinary laptop computer using the library spatstat (see Baddeley and Turner, 2005)
that supplies tools for the analysis of spatial point patterns within the statistical computing
environment R (R Development Core Team, 2007). Increasing either $N$ to 50 or $\lambda$ to 110 while
keeping the other parameter fixed still keeps the computation time well under one minute. Note
that the optimal point assignments needed for computing $\bar d_1$, and also $\bar d_2$ between empirical
measures, can be found efficiently (in $O((m \vee n)^3)$ steps, where $m$ and $n$ are the cardinalities
of the point patterns) by using the so-called Hungarian method from linear programming (see
Papadimitriou and Steiglitz, 1998, Section 11.2).
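To make the role of the optimal point assignment concrete, here is a minimal sketch that evaluates $\bar d_1$ by brute force over all assignments. It is not the Hungarian method mentioned above and is only usable for very small patterns, since the number of assignments grows factorially. The names `d0` and `d1_bar` are hypothetical, and taking $d_0$ to be the Euclidean distance truncated at 1 is an assumption made here so that $d_0 \le 1$.

```python
from itertools import permutations
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def d0(x: Point, y: Point) -> float:
    """Euclidean distance truncated at 1 (assumed choice of d0, so that d0 <= 1)."""
    return min(1.0, dist(x, y))


def d1_bar(xi: Sequence[Point], eta: Sequence[Point]) -> float:
    """Brute-force d1-bar between two finite point patterns."""
    if len(xi) > len(eta):
        xi, eta = eta, xi              # make xi the smaller pattern
    m, n = len(xi), len(eta)
    if n == 0:
        return 0.0                     # both patterns are empty
    # Try every injective assignment of the m points of xi to points of eta.
    best = min(
        sum(d0(x, eta[j]) for x, j in zip(xi, assignment))
        for assignment in permutations(range(n), m)
    )
    # Each of the n - m unmatched points costs the maximal distance 1.
    return (best + (n - m)) / n
```

For instance, `d1_bar([(0.1, 0.1)], [(0.1, 0.2), (0.9, 0.9)])` is approximately 0.55: one matched pair at distance 0.1 plus one unmatched point, divided by 2. The metric $d_1$ would assign distance 1 to the same pair of patterns because their cardinalities differ.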

Table 5.1 summarizes the results of our simulations. The first column gives the Monte Carlo
powers of our test against $\mathrm{Po}(\lambda f_\kappa(x, y)\, \mathrm{Leb}^2(d(x, y)))$-alternatives, where

$$f_\kappa(x, y) = \frac{\kappa \exp(-\kappa x)}{1 - \exp(-\kappa)}$$

for $x, y \in [0, 1]$ and $\kappa = 1, \ldots, 4$. See Figure 5.1 to obtain an impression of the corresponding
distributions. By Monte Carlo power we mean the fraction of rejected tests in
100 independent simulations of the alternative.
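Realizations of such alternatives are straightforward to simulate, because $f_\kappa$ depends on the first coordinate only and its distribution function can be inverted in closed form. The following sketch uses only the Python standard library; all function names are hypothetical, and it is an illustration under these assumptions rather than the simulation code used for the study (which relied on spatstat in R).

```python
import math
import random
from typing import List, Tuple


def sample_poisson_count(lam: float, rng: random.Random) -> int:
    """Poisson(lam) variate: number of unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def inverse_cdf_x(u: float, kappa: float) -> float:
    """Inverse of F(x) = (1 - exp(-kappa x)) / (1 - exp(-kappa)) on [0, 1]."""
    return -math.log(1.0 - u * (1.0 - math.exp(-kappa))) / kappa


def sample_alternative(lam: float, kappa: float,
                       rng: random.Random) -> List[Tuple[float, float]]:
    """One realization of a Poisson process with intensity lam * f_kappa on [0,1]^2."""
    n = sample_poisson_count(lam, rng)
    # f_kappa depends on x only, so the y-coordinate is uniform on [0, 1].
    return [(inverse_cdf_x(rng.random(), kappa), rng.random()) for _ in range(n)]
```

A call such as `sample_alternative(30.0, 2.0, random.Random(1))` returns a list of points in the unit square whose first coordinates concentrate near 0, more strongly so for larger `kappa`.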

For many applications it would be desirable to generalize $\bar d_1$ by introducing an order parameter
$p \ge 1$ and a cut-off value $c > 0$, which leads to the definition of

$$\bar d_1^{(p,c)}(\xi, \eta) := \biggl( \frac{1}{n} \Bigl( \min_{\pi \in \Pi_n} \sum_{i=1}^m \min\bigl(c, d_0(x_i, y_{\pi(i)})\bigr)^p + c^p (n - m) \Bigr) \biggr)^{1/p}$$

for $\xi = \sum_{i=1}^m \delta_{x_i}$, $\eta = \sum_{j=1}^n \delta_{y_j} \in \mathfrak{N}$ with $n \ge \max(m, 1)$, where $d_0$ of course no longer needs
to be bounded and is just taken to be the Euclidean metric here. We stick to the case $p = 1$,
but give in the second column of Table 5.1 the corresponding Monte Carlo powers if the cut-off
is chosen to be $c = 0.3$ instead of 1, so that our test now puts less emphasis on cardinalities
and more emphasis on positional differences in the compared point patterns than it did before.
There is no strong reason for choosing exactly $c = 0.3$; the value reflects the somewhat vague
idea that in an optimal pairing of about 30 points each, the pairing distances are "usually"
still below 0.3. As one can see from Table 5.1, the power improvement is very noticeable, and
especially this second test detects the inhomogeneities quite well even if they are not very clearly
visible by eyeball observation of the simulated data.
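A small worked example may help to see how the cut-off shifts the balance between cardinality and position. The configuration is hypothetical and chosen only for illustration: take $p = 1$, $\xi = \delta_{x_1}$ and $\eta = \delta_{y_1} + \delta_{y_2}$ (so $m = 1$, $n = 2$) with $d_0(x_1, y_1) = 0.1$ and $d_0(x_1, y_2) = 0.5$. Then

$$
\begin{aligned}
\bar d_1^{(1,1)}(\xi, \eta) &= \tfrac{1}{2} \bigl( \min\{\min(1, 0.1), \min(1, 0.5)\} + 1 \cdot (2 - 1) \bigr) = \tfrac{1}{2}(0.1 + 1) = 0.55, \\
\bar d_1^{(1,0.3)}(\xi, \eta) &= \tfrac{1}{2} \bigl( \min\{\min(0.3, 0.1), \min(0.3, 0.5)\} + 0.3 \cdot (2 - 1) \bigr) = \tfrac{1}{2}(0.1 + 0.3) = 0.2.
\end{aligned}
$$

With $c = 1$ the unmatched point accounts for 1 of the 1.1 units in the numerator, whereas with $c = 0.3$ it accounts for 0.3 of 0.4, so the positional term carries relatively more weight.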

For comparison we have also added the results of the corresponding tests if $\bar d_1$ is replaced
by $d_1$. Since there is typically a wide range of values for the cardinalities of realizations of a
Poisson process with 30 expected points, and since differing cardinalities are not appropriately
addressed by $d_1$, these tests perform very poorly (for $c = 0.3$, powers seem to lift off from $\kappa = 9$
on).



  κ    $\bar d_1$, c = 1    $\bar d_1$, c = 0.3    $d_1$, c = 1    $d_1$, c = 0.3
  1          0.10                  0.23                0.08             0.02
  2          0.41                  0.97                0.12             0.06
  3          0.93                  1.00                0.06             0.04
  4          1.00                  1.00                0.10             0.10

Table 5.1: Powers of the tests for two different cut-off values $c$ against increasingly conspicuous
alternatives. The last two columns give the corresponding results when the test is based on the metric $d_1$
instead of $\bar d_1$ and are listed for comparison only.

In summary, the above procedure is rather successful for testing spatial homogeneity from 
multiple point patterns. We also have obtained promising first results when testing for spatial 
dependence, but a more extensive further study will be necessary in order to establish the 
possibilities and limitations of this test procedure and of tests or other statistical analyses based 
on the $\bar d_1$-metric in general.

Appendix: proofs left out in the main text 

Proof of Proposition 2.A. From the definition it is clear that $0 \le \bar d_1 \le 1$, that $\bar d_1(\xi, \eta) = 0$
if and only if $\xi = \eta$ and that $\bar d_1(\xi, \eta) = \bar d_1(\eta, \xi)$. To show the triangle inequality let $\xi =
\sum_{i=1}^l \delta_{x_i}$, $\eta = \sum_{j=1}^m \delta_{y_j}$, $\zeta = \sum_{k=1}^n \delta_{z_k} \in \mathfrak{N}$, and add two points $u_1$ and $u_2$ to $\mathcal{X}$, extending $d_0$ by
$d_0(u_1, u_2) := d_0(u_1, u) := d_0(u_2, u) := 1$ for every $u \in \mathcal{X}$.
Note that for $l = m = n$ it is straightforward to see that

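$$\min_{\pi \in \Pi_n} \sum_{i=1}^n d_0(x_i, y_{\pi(i)}) \le \min_{\pi \in \Pi_n} \sum_{i=1}^n d_0(x_i, z_{\pi(i)}) + \min_{\pi \in \Pi_n} \sum_{i=1}^n d_0(z_i, y_{\pi(i)}), \tag{A.1}$$

which is the essential step in proving the triangle inequality for $d_1$.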

We now prove that $\bar d_1(\xi, \eta) \le \bar d_1(\xi, \zeta) + \bar d_1(\zeta, \eta)$ assuming that at most one of the point
patterns is empty (otherwise the relation is clearly satisfied). Since this inequality is symmetric
in $\xi$ and $\eta$, we assume without loss of generality that $l \le m$ in what follows. We show two
separate cases.

Case 1, $m \le n$: Let $x_i := u_1$ for $l + 1 \le i \le n$ and $y_j := u_2$ for $m + 1 \le j \le n$, and write
$\tilde\xi := \sum_{i=1}^n \delta_{x_i}$ and $\tilde\eta := \sum_{j=1}^n \delta_{y_j}$. We then have

$$\bar d_1(\xi, \eta) \le \bar d_1(\tilde\xi, \tilde\eta) = d_1(\tilde\xi, \tilde\eta) \le d_1(\tilde\xi, \zeta) + d_1(\zeta, \tilde\eta) = \bar d_1(\xi, \zeta) + \bar d_1(\zeta, \eta), \tag{A.2}$$

using that $a \le m$ implies $\frac{a}{m} \le \frac{a + n - m}{n}$ for the first inequality, and (A.1) for the second inequality.
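The elementary inequality behind the first step of (A.2) can be verified directly. Here we read $a$ as the numerator of $\bar d_1(\xi, \eta)$, that is $a = \min_{\pi \in \Pi_m} \sum_{i=1}^l d_0(x_i, y_{\pi(i)}) + (m - l)$, so that $\bar d_1(\xi, \eta) = a/m$ and $\bar d_1(\tilde\xi, \tilde\eta) = (a + n - m)/n$; this reading is our interpretation of the notation. For $0 \le a \le m \le n$ with $m \ge 1$,

$$\frac{a + n - m}{n} - \frac{a}{m} = \frac{m(a + n - m) - an}{mn} = \frac{(m - a)(n - m)}{mn} \ge 0.$$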

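Case 2, $m > n$: Let $x_i := z_k := u_1$ for $l + 1 \le i \le l \vee n$ and $n + 1 \le k \le l \vee n$, and
choose $y'_1, \ldots, y'_{l \vee n}$ in such a way that $\sum_{i=1}^{l \vee n} \delta_{y'_i} \le \sum_{j=1}^m \delta_{y_j}$ and $\min_{\pi \in \Pi_{l \vee n}} \sum_{i=1}^{l \vee n} d_0(z_i, y'_{\pi(i)}) =
\min_{\pi \in \Pi_m} \sum_{i=1}^{l \vee n} d_0(z_i, y_{\pi(i)})$. We have

$$
\begin{aligned}
\bar d_1(\xi, \eta) &= \frac{1}{m} \Bigl( \min_{\pi \in \Pi_m} \sum_{i=1}^{l \vee n} d_0(x_i, y_{\pi(i)}) + (m - (l \vee n)) \Bigr) \\
&\le \frac{1}{m} \Bigl( \min_{\pi \in \Pi_{l \vee n}} \sum_{i=1}^{l \vee n} d_0(x_i, y'_{\pi(i)}) + (m - (l \vee n)) \Bigr) \\
&\le \frac{1}{m} \Bigl( \min_{\pi \in \Pi_{l \vee n}} \sum_{i=1}^{l \vee n} d_0(x_i, z_{\pi(i)}) + \min_{\pi \in \Pi_{l \vee n}} \sum_{i=1}^{l \vee n} d_0(z_i, y'_{\pi(i)}) + (m - (l \vee n)) \Bigr) \\
&= \frac{1}{m} \Bigl( \min_{\pi \in \Pi_{l \vee n}} \sum_{i=1}^{l \vee n} d_0(x_i, z_{\pi(i)}) + \min_{\pi \in \Pi_m} \sum_{i=1}^{n} d_0(z_i, y_{\pi(i)}) + (m - n) \Bigr) \\
&\le \bar d_1(\xi, \zeta) + \bar d_1(\zeta, \eta),
\end{aligned}
$$

where we used (A.1) for the second inequality.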



□ 



Proof of Proposition 2.B. Statement (i) is straightforward from the definitions of $d_R$, $d_1$ and $\bar d_1$.

Statement (ii). Proposition 4.2 in Xia (2005) states that $\xi_n \to \xi$ vaguely if and only if
$d_1(\xi_n, \xi) \to 0$ as $n \to \infty$; so all we need to show is that the latter is equivalent to $\bar d_1(\xi_n, \xi) \to 0$.

If $\bar d_1(\xi_n, \xi) \to 0$, we have by (i) that $d_R(|\xi_n|, |\xi|) \to 0$, from which it is easily seen that
$|\xi_n| \to |\xi|$, i.e. there is an $n_0 \in \mathbb{N}$ such that $|\xi_n| = |\xi|$ and hence $d_1(\xi_n, \xi) = \bar d_1(\xi_n, \xi)$ for every
$n \ge n_0$. Thus $d_1(\xi_n, \xi) \to 0$. The converse direction follows immediately from $\bar d_1 \le d_1$.

Statement (iii). The local compactness and separability properties depend only on the
generated topology. See for example Proposition 4.3 in Xia (2005) for the proof. Note that, by
the compactness of $\mathcal{X}$, the sets $\mathfrak{N}_l := \{\xi \in \mathfrak{N};\ |\xi| = l\}$ are compact for all $l \in \mathbb{Z}_+$.

It remains to show the completeness. Let $(\xi_n)_{n \in \mathbb{N}}$ be a $\bar d_1$-Cauchy sequence in $\mathfrak{N}$. It is
straightforward to see that this implies the existence of an $n_0 \in \mathbb{N}$ such that $|\xi_n| = |\xi_m|$ for
every $n, m \ge n_0$, which means that there is an $l \in \mathbb{Z}_+$ such that the tail of $(\xi_n)_{n \in \mathbb{N}}$ is a Cauchy
sequence in $\mathfrak{N}_l$. By the compactness of $\mathfrak{N}_l$ this tail converges. □



Proof of Proposition 2.C. Statement (i) is an immediate consequence of the Kantorovich-Rubinstein
theorem, where the minimum is attained, because $(\mathfrak{N}, \bar d_1)$ is complete. See Dudley (1989),
Section 11.8, for details. Statement (ii) follows by taking expectations and minima in Proposition
2.B(i). The last statement follows from Dudley (1989), Theorem 11.3.3, using Proposition
2.B(ii) and noting that $\bar d_2$ is an instance of Dudley's $\beta$-metric. □